17 research outputs found

    A Semi-Empirical Model of PV Modules Including Manufacturing I-V Mismatch

    This paper presents an analysis of the impact of manufacturing variability in PV modules interconnected into a large PV panel. The key enabling technology is a compact semi-empirical model that is built solely from information derived from datasheets, without requiring the extraction of electrical parameters or measurements. The model makes explicit the dependence of output power on the quantities most affected by variability, such as the short-circuit current and the open-circuit voltage. In this way, variability can be included with Monte Carlo techniques and tuned to the desired distributions and tolerances. In the experimental results, we prove the effectiveness of the model in the analysis of the optimal interconnection of PV modules, with the goal of reducing the impact of variability.
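
    As an illustration of the Monte Carlo approach described above, the following minimal sketch (not the paper's semi-empirical model) samples per-module short-circuit currents and open-circuit voltages within an assumed tolerance and estimates the resulting panel-level mismatch loss; all datasheet values, the fill-factor approximation, and the interconnection topology are hypothetical placeholders.

```python
# Minimal Monte Carlo sketch of I-V mismatch in a PV panel (illustrative only,
# not the paper's semi-empirical model). Datasheet values and tolerances below
# are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

ISC_NOM, VOC_NOM, FF = 9.0, 37.0, 0.78   # per-module short-circuit current [A],
                                          # open-circuit voltage [V], fill factor
TOL = 0.03                                # +/-3% manufacturing tolerance (assumed)
N_SERIES, N_PARALLEL = 20, 4              # panel interconnection (assumed)
N_RUNS = 10_000                           # Monte Carlo samples

def panel_power(isc, voc):
    """Approximate panel power for one Monte Carlo sample.

    isc, voc: arrays of shape (N_PARALLEL, N_SERIES) with per-module values.
    In a series string the current is limited by the weakest module, so
    mismatch translates directly into a power loss.
    """
    string_current = isc.min(axis=1)          # series string: worst-case current
    string_voltage = voc.sum(axis=1)          # voltages add up in series
    return FF * np.sum(string_current * string_voltage)

powers = []
for _ in range(N_RUNS):
    isc = rng.normal(ISC_NOM, TOL * ISC_NOM / 3, size=(N_PARALLEL, N_SERIES))
    voc = rng.normal(VOC_NOM, TOL * VOC_NOM / 3, size=(N_PARALLEL, N_SERIES))
    powers.append(panel_power(isc, voc))

ideal = FF * N_PARALLEL * N_SERIES * ISC_NOM * VOC_NOM
print(f"mean mismatch loss: {100 * (1 - np.mean(powers) / ideal):.2f}%")
```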

    C-NMT: A Collaborative Inference Framework for Neural Machine Translation

    Collaborative Inference (CI) optimizes the latency and energy consumption of deep learning inference through the inter-operation of edge and cloud devices. Although beneficial for other tasks, CI has never been applied to the sequence-to-sequence mapping problem at the heart of Neural Machine Translation (NMT). In this work, we address the specific issues of collaborative NMT, such as estimating the latency required to generate the (unknown) output sequence, and show how existing CI methods can be adapted to these applications. Our experiments show that CI can reduce the latency of NMT by up to 44% compared to a non-collaborative approach.
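
    A hedged sketch of the collaborative-inference decision for NMT: the (unknown) output length is estimated from the input length and plugged into simple linear latency models for the edge and cloud executors. The cost models, constants, and helper names below are illustrative assumptions, not the paper's actual estimators.

```python
# Hedged sketch of a collaborative-inference dispatch rule for NMT: estimate the
# (unknown) output length from the input length, predict edge vs. cloud latency
# with simple linear cost models, and dispatch accordingly. All constants and
# helper names are illustrative assumptions, not the paper's actual models.
from dataclasses import dataclass

@dataclass
class LinearLatencyModel:
    per_token_ms: float   # decoding cost per generated token
    fixed_ms: float       # setup cost (model load, network round trip, ...)

    def predict(self, n_tokens: int) -> float:
        return self.fixed_ms + self.per_token_ms * n_tokens

EDGE = LinearLatencyModel(per_token_ms=35.0, fixed_ms=5.0)      # assumed numbers
CLOUD = LinearLatencyModel(per_token_ms=4.0, fixed_ms=120.0)    # incl. network RTT

def estimate_output_length(src_tokens: int, ratio: float = 1.1) -> int:
    # The NMT output length is unknown before decoding; a simple proxy is a
    # language-pair-dependent ratio of the source length.
    return max(1, round(ratio * src_tokens))

def dispatch(src_tokens: int) -> str:
    n_out = estimate_output_length(src_tokens)
    return "edge" if EDGE.predict(n_out) <= CLOUD.predict(n_out) else "cloud"

for n in (5, 20, 80):
    print(n, "->", dispatch(n))
```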

    Energy-efficient adaptive machine learning on IoT end-nodes with class-dependent confidence

    Energy-efficient machine learning models that can run directly on edge devices are of great interest in IoT applications, as they can reduce network pressure and response latency, and improve privacy. An effective way to obtain energy efficiency with small accuracy drops is to sequentially execute a set of increasingly complex models, early-stopping the procedure for 'easy' inputs that can be confidently classified by the smallest models. As a stopping criterion, current methods employ a single threshold on the output probabilities produced by each model. In this work, we show that such a criterion is sub-optimal for datasets that include classes of different complexity, and we demonstrate a more general approach based on per-class thresholds. With experiments on a low-power end-node, we show that our method can significantly reduce the energy consumption compared to the single-threshold approach.
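
    The per-class threshold criterion can be sketched as follows for a two-model ('big/little') cascade; the models and threshold values are toy placeholders, not those used in the paper.

```python
# Minimal sketch of big/little inference with per-class confidence thresholds
# (the general idea described above; thresholds and models are placeholders).
import numpy as np

def cascade_predict(x, small_model, big_model, class_thresholds):
    """Run the small model first; invoke the big model only when the small
    model's confidence is below the threshold of its predicted class."""
    probs = small_model(x)                 # softmax output of the small model
    pred = int(np.argmax(probs))
    if probs[pred] >= class_thresholds[pred]:
        return pred, "small"               # 'easy' input: stop early
    return int(np.argmax(big_model(x))), "big"

# Toy stand-ins for the two models and the thresholds (assumed values):
small = lambda x: np.array([0.7, 0.2, 0.1])
big = lambda x: np.array([0.1, 0.85, 0.05])
thresholds = np.array([0.9, 0.6, 0.6])     # a 'hard' class gets a stricter threshold

print(cascade_predict(None, small, big, thresholds))   # -> (1, 'big')
```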

    Predicting Hard Disk Failures in Data Centers Using Temporal Convolutional Neural Networks

    In modern data centers, storage system failures are major contributors to downtime and maintenance costs. Predicting these failures by collecting measurements from disks and analyzing them with machine learning techniques can effectively reduce their impact, enabling timely maintenance. While there is a vast literature on this subject, most approaches attempt to predict hard disk failures using either classic machine learning solutions, such as Random Forests (RFs), or deep Recurrent Neural Networks (RNNs). In this work, we address hard disk failure prediction using Temporal Convolutional Networks (TCNs), a novel type of deep neural network for time series analysis. Using a real-world dataset, we show that TCNs outperform both RFs and RNNs. Specifically, we improve the Fault Detection Rate (FDR) by ≈ 7.5% (FDR = 89.1%) compared to the state-of-the-art, while simultaneously reducing the False Alarm Rate (FAR = 0.052%). Moreover, we explore the network architecture design space, showing that TCNs are consistently superior to RNNs for a given model size and complexity, and that even relatively small TCNs can reach satisfactory performance. All the code needed to reproduce the results presented in this paper is available at https://github.com/ABurrello/tcn-hard-disk-failure-prediction.
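
    The sketch below shows the kind of dilated, causal temporal convolution stack that a TCN is built from, applied to windows of per-disk SMART attributes; layer sizes, the number of features, and the window length are illustrative assumptions, not the architecture evaluated in the paper.

```python
# A minimal dilated temporal convolution stack in PyTorch, illustrating the kind
# of TCN applied to windows of SMART attributes (sizes are illustrative only).
import torch
import torch.nn as nn

class CausalConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation            # left-pad to stay causal
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):                                  # x: (batch, ch, time)
        x = nn.functional.pad(x, (self.pad, 0))            # pad only on the left
        return self.act(self.conv(x))

class TinyTCN(nn.Module):
    def __init__(self, n_features=19, n_classes=2):
        super().__init__()
        self.blocks = nn.Sequential(
            CausalConvBlock(n_features, 32, dilation=1),
            CausalConvBlock(32, 32, dilation=2),
            CausalConvBlock(32, 32, dilation=4),           # receptive field grows exponentially
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):                                  # x: (batch, features, window)
        h = self.blocks(x)
        return self.head(h[:, :, -1])                      # classify from the last time step

# Example: a batch of 8 disks, 19 SMART features, 90-day window (assumed shapes).
logits = TinyTCN()(torch.randn(8, 19, 90))
print(logits.shape)   # torch.Size([8, 2])
```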

    Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes

    Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for different portions of the network, has been shown to provide excellent efficiency gains with limited accuracy drops, especially with optimized bit-width assignments determined by automated Neural Architecture Search (NAS) tools. State-of-the-art mixed-precision quantization works layer-wise, i.e., it uses different bit-widths for the weight and activation tensors of each network layer. In this work, we widen the search space, proposing a novel NAS that selects the bit-width of each weight tensor channel independently. This gives the tool the additional flexibility of assigning a higher precision only to the weights associated with the most informative features. Testing on the MLPerf Tiny benchmark suite, we obtain a rich collection of Pareto-optimal models in the accuracy vs. model size and accuracy vs. energy spaces. When deployed on the MPIC RISC-V edge processor, our networks reduce the memory and energy for inference by up to 63% and 27%, respectively, compared to a layer-wise approach, for the same accuracy.
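
    To make the channel-wise idea concrete, the following sketch applies symmetric fake quantization to a convolution weight tensor with an independent bit-width per output channel; the NAS that selects those bit-widths is not shown, and the values below are arbitrary.

```python
# Illustrative per-channel fake quantization of a convolution weight tensor,
# showing what a channel-wise bit-width assignment means in practice (the NAS
# that selects the bit-widths is not shown; the assignments here are arbitrary).
import torch

def fake_quantize_per_channel(weight, bits_per_channel):
    """Symmetric uniform quantization with an independent bit-width per output
    channel. weight: (out_ch, in_ch, kH, kW); bits_per_channel: list of ints."""
    q = torch.empty_like(weight)
    for c, bits in enumerate(bits_per_channel):
        w = weight[c]
        scale = w.abs().max() / (2 ** (bits - 1) - 1)      # per-channel scale
        q[c] = torch.round(w / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale
    return q

w = torch.randn(4, 16, 3, 3)
bits = [8, 4, 2, 4]           # e.g. the most informative channel keeps 8 bits
wq = fake_quantize_per_channel(w, bits)
print((w - wq).abs().mean())  # quantization error grows as the bit-width shrinks
```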

    Improving PPG-based Heart-Rate Monitoring with Synthetically Generated Data

    Improving the quality of heart-rate monitoring is the basis for continuous assessment of people's health in daily life. Recent state-of-the-art heart-rate monitoring algorithms exploit PPG and inertial data to efficiently estimate subjects' beats per minute (BPM) directly on wearable devices. Although the ease of recording these signals (e.g., through commercial smartwatches) makes this approach appealing, new challenges arise. The first problem is fitting these algorithms into low-power, memory-constrained MCUs. Further, the PPG signal usually has a low signal-to-noise ratio due to motion artifacts (MAs) arising from movements of the subjects' arms. In this work, we propose using synthetically generated data to improve the accuracy of PPG-based heart-rate tracking with deep neural networks, without increasing the algorithm's complexity. Using the TEMPONet network as the baseline, we show that the heart-rate tracking Mean Absolute Error (MAE) can be reduced from 5.28 to 4.86 BPM on the PPG-DaLiA dataset. Notably, to do so we only increase the training time, keeping the inference step unchanged. Consequently, the new and more accurate network still fits the small memory of the GAP8 MCU, occupying 429 KB when quantized to 8 bits.
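
    Purely as an illustration of training-time data synthesis (the paper's actual generation procedure is not reproduced here), the toy generator below produces PPG-like windows labelled with their BPM by mixing a cardiac component, a low-frequency 'motion artifact', and noise.

```python
# A toy generator of synthetic PPG windows labelled with their BPM, of the kind
# that could be used to augment training data (purely illustrative; the paper's
# actual generation procedure is not reproduced here).
import numpy as np

def synthetic_ppg(bpm, fs=32, seconds=8, motion_amp=0.3, rng=None):
    """Return (signal, bpm): a periodic pulse at the given heart rate plus a
    low-frequency 'motion artifact' and white noise."""
    rng = rng or np.random.default_rng()
    t = np.arange(seconds * fs) / fs
    pulse = np.sin(2 * np.pi * (bpm / 60.0) * t)                         # cardiac component
    motion = motion_amp * np.sin(2 * np.pi * rng.uniform(0.5, 2.0) * t)  # artifact
    noise = 0.05 * rng.standard_normal(t.size)
    return pulse + motion + noise, bpm

# Augment a training set with windows spanning a plausible BPM range.
rng = np.random.default_rng(0)
extra_x, extra_y = zip(*(synthetic_ppg(bpm, rng=rng) for bpm in rng.uniform(40, 180, 128)))
print(len(extra_x), extra_x[0].shape)   # 128 synthetic windows, each 8 s at 32 Hz
```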

    Bioformers: Embedding Transformers for Ultra-Low Power sEMG-based Gesture Recognition

    Human-machine interaction is gaining traction in rehabilitation tasks, such as controlling prosthetic hands or robotic arms. Gesture recognition exploiting surface electromyographic (sEMG) signals is one of the most promising approaches, given that sEMG signal acquisition is non-invasive and directly related to muscle contraction. However, the analysis of these signals still presents many challenges, since similar gestures result in similar muscle contractions; the resulting signal shapes are thus almost identical, leading to low classification accuracy. To tackle this challenge, complex neural networks are employed, which require large memory footprints, consume relatively high energy, and limit the maximum battery life of the devices used for classification. This work addresses this problem by introducing Bioformers, a new family of ultra-small attention-based architectures that approaches state-of-the-art performance while reducing the number of parameters and operations by 4.9×. Additionally, by introducing a new inter-subject pre-training, we improve the accuracy of our best Bioformer by 3.39%, matching state-of-the-art accuracy without any additional inference cost. Deploying our best-performing Bioformer on a Parallel Ultra-Low Power (PULP) microcontroller unit (MCU), the GreenWaves GAP8, we achieve an inference latency and energy of 2.72 ms and 0.14 mJ, respectively, 8.0× lower than the previous state-of-the-art neural network, while occupying just 94.2 kB of memory.
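
    The sketch below shows a deliberately tiny attention-based sEMG classifier in the spirit of such ultra-small architectures; layer sizes and sequence parameters are illustrative and do not reproduce the actual Bioformer configuration.

```python
# A deliberately tiny attention-based sEMG gesture classifier in PyTorch,
# illustrating the ultra-small transformer idea (not the Bioformer itself).
import torch
import torch.nn as nn

class TinyAttentionClassifier(nn.Module):
    def __init__(self, n_channels=8, d_model=32, n_classes=8, seq_len=30):
        super().__init__()
        self.embed = nn.Linear(n_channels, d_model)          # project sEMG channels
        self.pos = nn.Parameter(torch.zeros(1, seq_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=2, dim_feedforward=64, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                                    # x: (batch, time, channels)
        h = self.encoder(self.embed(x) + self.pos)
        return self.head(h.mean(dim=1))                      # average pooling over time

model = TinyAttentionClassifier()
print(sum(p.numel() for p in model.parameters()), "parameters")
print(model(torch.randn(4, 30, 8)).shape)                    # torch.Size([4, 8])
```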

    Ultra-compact binary neural networks for human activity recognition on RISC-V processors

    Human Activity Recognition (HAR) is a relevant inference task in many mobile applications. State-of-the-art HAR at the edge is typically achieved with lightweight machine learning models such as decision trees and Random Forests (RFs), whereas deep learning is less common due to its high computational complexity. In this work, we propose a novel implementation of HAR based on deep neural networks, and precisely on Binary Neural Networks (BNNs), targeting low-power general-purpose processors with a RISC-V instruction set. BNNs yield very small memory footprints and low inference complexity, thanks to the replacement of arithmetic operations with bit-wise ones. However, existing BNN implementations on general-purpose processors impose constraints tailored to complex computer vision tasks, which result in over-parametrized models for simpler problems like HAR. Therefore, we also introduce a new BNN inference library that explicitly targets ultra-compact models. With experiments on a single-core RISC-V processor, we show that BNNs trained on two HAR datasets obtain higher classification accuracy than a state-of-the-art baseline based on RFs. Furthermore, our BNNs reach the same accuracy as an RF with either up to 91% less memory or up to 70% higher energy efficiency, depending on the complexity of the features extracted by the RF.
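
    The core trick that makes BNNs cheap on a general-purpose core can be illustrated in a few lines: with weights and activations constrained to {-1, +1} and packed as bits, a dot product reduces to an XNOR followed by a popcount. The snippet below verifies this equivalence on random vectors (a conceptual sketch, not the inference library itself).

```python
# Sketch of the core BNN trick: a dot product over {-1, +1} values packed as
# bits becomes an XNOR followed by a popcount, which is what makes these models
# cheap on a RISC-V core.
import random

N = 64                                   # vector length (one packed machine word)

def pack(bits):                          # bits: list of 0/1 (0 encodes -1, 1 encodes +1)
    word = 0
    for b in bits:
        word = (word << 1) | b
    return word

a_bits = [random.randint(0, 1) for _ in range(N)]
w_bits = [random.randint(0, 1) for _ in range(N)]

# Reference: dot product over the {-1, +1} values.
ref = sum((2 * a - 1) * (2 * w - 1) for a, w in zip(a_bits, w_bits))

# Bit-wise version: matching bits contribute +1, differing bits -1.
xnor = ~(pack(a_bits) ^ pack(w_bits)) & ((1 << N) - 1)
dot = 2 * bin(xnor).count("1") - N

assert dot == ref
print(dot)
```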

    Prediction of All-Cause Mortality Following Percutaneous Coronary Intervention in Bifurcation Lesions Using Machine Learning Algorithms

    Stratifying prognosis following coronary bifurcation percutaneous coronary intervention (PCI) is an unmet clinical need that may be fulfilled through the adoption of machine learning (ML) algorithms to refine outcome predictions. We sought to develop an ML-based risk stratification model built on clinical, anatomical, and procedural features to predict all-cause mortality following contemporary bifurcation PCI. Multiple ML models to predict all-cause mortality were tested on a cohort of 2393 patients (training, n = 1795; internal validation, n = 598) undergoing bifurcation PCI with contemporary stents from the real-world RAIN registry. Twenty-five commonly available patient- and lesion-related features were selected to train the ML models. The best model was validated in an external cohort of 1701 patients undergoing bifurcation PCI from the DUTCH PEERS and BIO-RESORT trial cohorts. In ROC curve analysis, the AUC for the prediction of 2-year mortality was 0.79 (0.74–0.83) in the overall population, 0.74 (0.62–0.85) at internal validation, and 0.71 (0.62–0.79) at external validation. Performance at risk-ranking analysis, k-center cross-validation, and continual learning confirmed the generalizability of the models, which are also available through an online interface. The RAIN-ML prediction model represents the first tool combining clinical, anatomical, and procedural features to predict all-cause mortality among patients undergoing contemporary bifurcation PCI with reliable performance.
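
    For readers unfamiliar with this kind of workflow, the hedged sketch below fits a gradient-boosted classifier on tabular features and reports ROC AUC on a held-out split and a stand-in 'external' cohort; it uses synthetic data and is not the RAIN-ML model.

```python
# Hedged sketch of the workflow described above: fit a model on tabular
# clinical/anatomical/procedural features and check discrimination (ROC AUC) on
# held-out and external data. Synthetic data only; not the RAIN-ML model.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2393, 25))                     # 25 patient-/lesion-level features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=2393) > 1.5).astype(int)  # toy outcome

X_train, X_int, y_train, y_int = train_test_split(X, y, test_size=0.25, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

# Internal validation (held-out split) and a stand-in 'external' cohort.
X_ext = rng.normal(size=(1701, 25))
y_ext = (X_ext[:, 0] + 0.5 * X_ext[:, 1] + rng.normal(size=1701) > 1.5).astype(int)
print("internal AUC:", roc_auc_score(y_int, model.predict_proba(X_int)[:, 1]))
print("external AUC:", roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1]))
```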

    Energy-efficient deep learning inference on edge devices

    The success of deep learning comes at the cost of very high computational complexity. Consequently, Internet of Things (IoT) edge nodes typically offload deep learning tasks to powerful cloud servers, an inherently inefficient solution. In fact, transmitting raw data to the cloud through wireless links incurs long latencies and high energy consumption. Moreover, pure cloud offloading is not scalable due to network pressure and poses security concerns related to the transmission of user data. The straightforward solution to these issues is to perform deep learning inference at the edge. However, cost- and power-constrained embedded processors with limited processing and memory capabilities cannot handle complex deep learning models. Even resorting to hardware acceleration, a common approach to handle such complexity, embedded devices are still not able to directly manage models designed for cloud servers. It then becomes necessary to employ proper optimization strategies to enable deep learning processing at the edge. In this chapter, we survey the most relevant optimizations to support embedded deep learning inference, focusing in particular on optimizations that favor hardware acceleration (such as quantization and big-little architectures). We divide our analysis into two parts. First, we review classic approaches based on static (design-time) optimizations. We then show how these solutions are often suboptimal, as they produce models that are either over-optimized for complex inputs (yielding accuracy losses) or under-optimized for simple inputs (losing energy-saving opportunities). Finally, we review the more recent trend of dynamic (input-dependent) optimizations, which solve this problem by adapting the optimization to the processed input.
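
    The dynamic, input-dependent optimizations mentioned above can be illustrated with the simplest 'big-little' scheme: a cheap model runs on every input and a large model is invoked only when confidence is low, so the average cost tracks input difficulty. Costs, threshold, and models in the sketch are made-up placeholders.

```python
# Minimal illustration of the static-vs-dynamic trade-off discussed above: a
# 'big-little' dynamic scheme runs a cheap model for every input and escalates
# to the large model only on low-confidence ('hard') inputs. Costs and models
# are made-up placeholders.
import random

COST_SMALL, COST_BIG = 1.0, 12.0        # relative inference costs (assumed)
THRESHOLD = 0.8

def small_model(x):                     # returns (prediction, confidence)
    return 0, random.uniform(0.5, 1.0)

def big_model(x):
    return 0, 0.99

total_cost, escalations, n = 0.0, 0, 10_000
for _ in range(n):
    pred, conf = small_model(None)
    total_cost += COST_SMALL
    if conf < THRESHOLD:                # 'hard' input: fall back to the big model
        pred, conf = big_model(None)
        total_cost += COST_BIG
        escalations += 1

print(f"avg cost/input: {total_cost / n:.2f} (static big model: {COST_BIG})")
print(f"escalation rate: {escalations / n:.1%}")
```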